Fast model selection based speaker adaptation for nonnative speech

Authors

Xiaodong He

Yunxin Zhao

Abstract

In this paper, the problem of adapting acoustic models of native English speech to nonnative speakers is addressed from the perspective of adaptive model complexity selection. The goal is to dynamically select model complexity for each nonnative talker so as to optimize the balance between model robustness to pronunciation variations and model detail for discriminating speech sounds. A maximum expected likelihood (MEL) based technique is proposed to enable reliable complexity selection when adaptation data are sparse: the expectation of the log-likelihood (EL) of the adaptation data is computed from distributions of mismatch biases between model and data, and the model complexity is selected to maximize EL. The MEL based complexity selection is further combined with MLLR to enable adaptation of both the complexity and the parameters of acoustic models. Experiments were performed on WSJ1 data from speakers with a wide range of foreign accents. Results show that the MEL based complexity selection was feasible with as little as one adaptation utterance, and that it was able to dynamically select a proper model complexity as the adaptation data increased. Compared with standard MLLR, the MEL + MLLR method led to consistent and significant improvements in recognition accuracy on nonnative speakers, without performance degradation on native speakers.
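
The selection rule described in the abstract can be written compactly. This is only a notational sketch of one plausible reading: the symbols $X$ (adaptation data), $\lambda_k$ (acoustic model of complexity $k$), $b$ (mismatch bias), $p(b)$ (bias distribution) and $\mathcal{K}$ (set of candidate complexities) are assumptions introduced here for illustration, not the authors' notation. The abstract does not say how the bias distributions are estimated or how the expectation is evaluated.

```latex
% Notational sketch only; every symbol is assumed, not taken from the paper.
% X: adaptation data, \lambda_k: model of complexity k, b: mismatch bias
\mathrm{EL}(k) = \mathbb{E}_{b \sim p(b)}\left[ \log p\left(X \mid \lambda_k, b\right) \right],
\qquad
\hat{k} = \arg\max_{k \in \mathcal{K}} \mathrm{EL}(k)
```

Under this reading, the selected model $\lambda_{\hat{k}}$ would then be passed to MLLR for parameter adaptation, which matches the MEL + MLLR combination named in the abstract.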

Similar resources

A Two-stage Speaker Adaptation Approach for Subspace Gaussian Mixture Model based Nonnative Speech Recognition

Nonnative speech recognition is becoming more and more important as many speech applications are deployed worldwide. Meanwhile, due to the large population of nonnative speakers, speaker adaptation remains the most practical way of providing high-performance speech services. Subspace Gaussian Mixture Model (SGMM) has recently been shown to yield superior performance on various native speech r...

Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation

A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...

Speech recognition for multiple non-native accent groups with speaker-group-dependent acoustic models

In this paper, the recognition performance for non-native English speech with two different kinds of speaker-group-dependent acoustic models is investigated. The approaches for creating speaker groups include knowledge-based grouping of non-native speakers by their first language, and the automatic clustering of speakers. Clustering is based on speaker-dependent acoustic models in speaker Eigensp...

Connectionist speaker normalization and adaptation

In speaker-independent, large-vocabulary continuous speech recognition systems, recognition accuracy varies considerably from speaker to speaker, and performance may be significantly degraded for outlier speakers such as nonnative talkers. In this paper, we explore supervised speaker adaptation and normalization in the MLP component of a hybrid hidden Markov model/multilayer perceptron versi...

Journal title:

IEEE Trans. Speech and Audio Processing

Volume 11

Publication date: 2003
